AI Document Translation on PDFs loses a lot of formatting and text color

Joachim Albertsson
2024-05-14T22:02:34.04+00:00

Hi, I'm building an Azure Frontend-Backend-AI Translate App.

I added the document translation part using the DocumentTranslationClient. On PDFs it works quite well for technical documents with little built-in design.

But when I try it on marketing material, a lot of the formatting is lost. For example:

  1. White text on a blue background -> black text on a blue background
  2. More stylish bullets (grey squares) -> normal round black bullets
  3. Line spacing is uneven in some areas compared to the original
  4. The company font is lost and replaced by a similar font
  5. Pictures inside a text flow end up on top of the translated text (e.g. "click <pic> to open")
  6. Copyright signs (©) are lost
  7. Tabs are sometimes lost (e.g. "6 <tab> Cleaning" -> "6 Rengöring")

Is there anything I can do about this?

Small code sample:

    
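```python
from azure.ai.translation.document import DocumentTranslationClient
from azure.core.credentials import AzureKeyCredential
client = DocumentTranslationClient(endpoint, AzureKeyCredential(key))
# storage_type="File": the source URL points to a single document, not a container
poller = client.begin_translation(source_blob_url, target_blob_url, "es", storage_type="File")
result = poller.result()  # blocks until the translation job completes
```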

(using azure-ai-translation-document 1.0.0)
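The sample stops at `poller.result()`. In azure-ai-translation-document that call yields one status object per document, so checking that each document reports `Succeeded` rules out outright failures before comparing layouts. This is a minimal sketch: the helper name `summarize_documents` is hypothetical, and the demo uses fake objects (`SimpleNamespace`) instead of a live Azure call, assuming only the `status`, `source_document_url` and `error` attributes of the SDK's per-document status.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_documents(docs):
    """Count per-document statuses and collect details for anything not 'Succeeded'.

    `docs` is assumed to be what poller.result() yields: items with `status`,
    `source_document_url` and `error` (an object with `code` and `message`, or None).
    """
    counts = Counter()
    failures = []
    for doc in docs:
        counts[doc.status] += 1
        if doc.status != "Succeeded":
            err = doc.error
            detail = f"{err.code}: {err.message}" if err else "no error details"
            failures.append((doc.source_document_url, doc.status, detail))
    return dict(counts), failures


if __name__ == "__main__":
    # Offline demo with fake status objects; with the real client you would
    # call summarize_documents(poller.result()) instead.
    fake = [
        SimpleNamespace(status="Succeeded", source_document_url="a.pdf", error=None),
        SimpleNamespace(
            status="Failed",
            source_document_url="b.pdf",
            error=SimpleNamespace(code="ExampleCode", message="example message"),
        ),
    ]
    print(summarize_documents(fake))
```

A document can report `Succeeded` and still lose formatting, so this only separates service errors from the layout issues listed above.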

Azure AI Language

1 answer

  1. Joachim Albertsson
    2024-05-20T22:25:16.72+00:00

    After some testing, it seems that non-standard fonts, such as custom-built company fonts, cause most of the formatting issues. Switching to standard fonts fixed most of the issues, though not all.
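Since non-standard fonts seem to be the main cause, it can help to list which fonts a PDF uses before sending it for translation. The sketch below is a rough stdlib-only heuristic, not a PDF parser: it scans the raw bytes and any Flate-compressed streams for `/BaseFont` names, strips the six-letter subset tag (e.g. `ABCDEF+`), and flags families missing from an allowlist. The function names `pdf_font_names` and `flag_nonstandard` and the allowlist are hypothetical; encrypted PDFs or streams using other filters will be missed, so a proper PDF library is more reliable.

```python
import re
import zlib

# PDF font subsets are named with a six-uppercase-letter tag plus "+",
# e.g. "ABCDEF+Arial-BoldMT"; stripping it leaves the actual font name.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")
# Font dictionaries name their font with an entry like "/BaseFont /Arial-BoldMT".
BASEFONT = re.compile(rb"/BaseFont\s*/([^\s/\[\]<>()]+)")
# Heuristic match for the body between "stream" and "endstream".
STREAM = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.S)


def pdf_font_names(data: bytes) -> set:
    """Return font names found in raw or Flate-compressed parts of a PDF (heuristic)."""
    chunks = [data]
    for match in STREAM.finditer(data):
        try:
            # decompressobj tolerates trailing bytes after the zlib stream
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate data (other filter or plain content); skip it
    names = set()
    for chunk in chunks:
        for found in BASEFONT.finditer(chunk):
            name = found.group(1).decode("latin-1")
            names.add(SUBSET_TAG.sub("", name))
    return names


def flag_nonstandard(fonts, allowed_families):
    """Return fonts whose family (text before the first '-' or ',') is not allowed."""
    flagged = []
    for name in sorted(fonts):
        family = re.split(r"[-,]", name, maxsplit=1)[0]
        if family not in allowed_families:
            flagged.append(name)
    return flagged
```

Usage would look like `flag_nonstandard(pdf_font_names(open("brochure.pdf", "rb").read()), {"Arial", "Helvetica"})`, where the file name and allowlist are placeholders. Any flagged font is a candidate to swap for a standard one in the source document before translating.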
